Takehome_Ex03

Take-home Exercise 3: Predicting HDB Public Housing Resale Pricies using Geographically Weighted Methods

Background

Housing is an essential component of household wealth worldwide. Buying a housing has always been a major investment for most people. The price of housing is affected by many factors. Some of them are global in nature such as the general economy of a country or inflation rate. Others can be more specific to the properties themselves. These factors can be further divided to structural and locational factors. Structural factors are variables related to the property themselves such as the size, fitting, and tenure of the property. Locational factors are variables related to the neighbourhood of the properties such as proximity to childcare centre, public transport service and shopping centre.

Conventional, housing resale prices predictive models were built by using Ordinary Least Square (OLS) method. However, this method failed to take into consideration that spatial autocorrelation and spatial heterogeneity exist in geographic data sets such as housing transactions. With the existence of spatial autocorrelation, the OLS estimation of predictive housing resale pricing models could lead to biased, inconsistent, or inefficient results (Anselin 1998). In view of this limitation, Geographical Weighted Models were introduced for calibrating predictive model for housing resale prices.

Objective

To predict HDB resale prices at the sub-market level for the month of January and February 2023 in Singapore.The predictive models will be built by using by using conventional OLS method and GWR methods. The objective is to compare the performance of the conventional OLS method versus the geographical weighted methods.

Concept for GWR, Hedonic pricing model

Geographically weighted regression (GWR) is a spatial statistical technique that takes non-stationary variables into consideration (e.g., climate; demographic factors; physical environment characteristics)

In this take home assignment, we need to build predictive model using GWR. The predictive model need to take consideration into locational factors such as proximity to childcare centre, public transport service and shopping centre.

Hence, we have to first identify the relevant location factors for this assignment, which including:

  • Proximity to CBD

  • Proximity to eldercare

  • Proximity to foodcourt/hawker centres

  • Proximity to MRT

  • Proximity to park

  • Proximity to good primary school

  • Proximity to shopping mall

  • Proximity to supermarket

  • Numbers of kindergartens within 350m

  • Numbers of childcare centres within 350m

  • Numbers of bus stop within 350m

  • Numbers of primary school within 1km

Steps

Load Necessary Library

pacman::p_load(olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gtsummary,readxl,jsonlite,rvest)
Installing package into 'C:/Users/65917/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
also installing the dependencies 'rlang', 'purrr', 'vctrs', 'tidyr'
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2:
  cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2/PACKAGES'
package 'rlang' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'rlang'
Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
C:\Users\65917\AppData\Local\R\win-library\4.2\00LOCK\rlang\libs\x64\rlang.dll
to C:\Users\65917\AppData\Local\R\win-library\4.2\rlang\libs\x64\rlang.dll:
Permission denied
Warning: restored 'rlang'
package 'purrr' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'purrr'
Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
C:\Users\65917\AppData\Local\R\win-library\4.2\00LOCK\purrr\libs\x64\purrr.dll
to C:\Users\65917\AppData\Local\R\win-library\4.2\purrr\libs\x64\purrr.dll:
Permission denied
Warning: restored 'purrr'
package 'vctrs' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'vctrs'
Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
C:\Users\65917\AppData\Local\R\win-library\4.2\00LOCK\vctrs\libs\x64\vctrs.dll
to C:\Users\65917\AppData\Local\R\win-library\4.2\vctrs\libs\x64\vctrs.dll:
Permission denied
Warning: restored 'vctrs'
package 'tidyr' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'tidyr'
Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
C:\Users\65917\AppData\Local\R\win-library\4.2\00LOCK\tidyr\libs\x64\tidyr.dll
to C:\Users\65917\AppData\Local\R\win-library\4.2\tidyr\libs\x64\tidyr.dll:
Permission denied
Warning: restored 'tidyr'
package 'ggpubr' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\65917\AppData\Local\Temp\RtmpklI3I7\downloaded_packages

ggpubr installed
Warning in pacman::p_load(olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, : Failed to install/load:
ggpubr

Importing geospatial data (MP2019)

mpsz2019 = st_read(dsn = "data/geospatial", layer = "MPSZ-2019")
Reading layer `MPSZ-2019' from data source 
  `C:\Quanfang777\IS415-GAA\Takehome_Exercise\Takehome_Exercise3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Updating CRS information

mpsz_svy21 <- st_transform(mpsz2019, 3414)
st_crs(mpsz_svy21)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Check invalid geometries

length(which(st_is_valid(mpsz_svy21) == FALSE))
[1] 6

Make the invalid geometries valid

mpsz_svy21 <- st_make_valid(mpsz_svy21)
length(which(st_is_valid(mpsz_svy21) == FALSE))
[1] 0

Aspatial Data Wrangling

Importing the aspatial data

resale = read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
Rows: 148277 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): month, town, flat_type, block, street_name, storey_range, flat_mode...
dbl (3): floor_area_sqm, lease_commence_date, resale_price

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We plan to look at four-room flat and a transaction period between1st January 2021 to 31st December 2022.

resale <- resale %>% 
  filter(flat_type == "4 ROOM") %>%
  filter(month >= "2021-01" & month < "2023-01")
resale
# A tibble: 23,656 × 11
   month   town    flat_…¹ block stree…² store…³ floor…⁴ flat_…⁵ lease…⁶ remai…⁷
   <chr>   <chr>   <chr>   <chr> <chr>   <chr>     <dbl> <chr>     <dbl> <chr>  
 1 2021-01 ANG MO… 4 ROOM  547   ANG MO… 04 TO …      92 New Ge…    1981 59 yea…
 2 2021-01 ANG MO… 4 ROOM  414   ANG MO… 01 TO …      92 New Ge…    1979 57 yea…
 3 2021-01 ANG MO… 4 ROOM  509   ANG MO… 01 TO …      91 New Ge…    1980 58 yea…
 4 2021-01 ANG MO… 4 ROOM  467   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
 5 2021-01 ANG MO… 4 ROOM  571   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
 6 2021-01 ANG MO… 4 ROOM  134   ANG MO… 07 TO …      98 New Ge…    1978 56 yea…
 7 2021-01 ANG MO… 4 ROOM  204   ANG MO… 07 TO …      92 New Ge…    1977 55 yea…
 8 2021-01 ANG MO… 4 ROOM  429   ANG MO… 04 TO …      92 New Ge…    1978 56 yea…
 9 2021-01 ANG MO… 4 ROOM  413   ANG MO… 10 TO …      92 New Ge…    1979 57 yea…
10 2021-01 ANG MO… 4 ROOM  415   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
# … with 23,646 more rows, 1 more variable: resale_price <dbl>, and abbreviated
#   variable names ¹​flat_type, ²​street_name, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date, ⁷​remaining_lease

However, when we look at the code, there is no Longitude and Latitude Information, which means we need to geocode it, to make sure we get accurate geocode result, we have to combine the Block and Street Name as input:

resale$street_name <- gsub("ST\\.", "SAINT", resale$street_name)
# 
# library(httr)
# geocode <- function(block, streetname) {
#   base_url <- "https://developers.onemap.sg/commonapi/search"
#   address <- paste(block, streetname, sep = " ")
#   query <- list("searchVal" = address, 
#                 "returnGeom" = "Y",
#                 "getAddrDetails" = "N",
#                 "pageNum" = "1")
#   
#   res <- GET(base_url, query = query)
#   restext<-content(res, as="text")
#   
#   output <- fromJSON(restext)  %>% 
#     as.data.frame %>%
#     select(results.LATITUDE, results.LONGITUDE)
# 
#   return(output)
# }
# resale$LATITUDE <- 0
# resale$LONGITUDE <- 0
# 
# for (i in 1:nrow(resale)){
#   temp_output <- geocode(resale[i, 4], resale[i, 5])
#   
#   resale$LATITUDE[i] <- temp_output$results.LATITUDE
#   resale$LONGITUDE[i] <- temp_output$results.LONGITUDE}

Make a copy for the geocoded data:

#write_csv(resale, "data/rds/resalecopy1.csv")

Converting aspatial data frame into a sf object

resale = read_csv("data/rds/resalecopy1.csv")
Rows: 23656 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): month, town, flat_type, block, street_name, storey_range, flat_mode...
dbl (5): floor_area_sqm, lease_commence_date, resale_price, LATITUDE, LONGITUDE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(resale)
resale.sf <- st_as_sf(resale,
                            coords = c("LONGITUDE", "LATITUDE"),
                            crs=4326) %>%
  st_transform(crs=3414)
rs_transform <- resale.sf %>%
  mutate(resale.sf, address = paste(block,street_name)) %>%
  mutate(resale.sf, remaining_lease_yr = as.integer(str_sub(remaining_lease, 0, 2))) %>%
  mutate(resale.sf, remaining_lease_mth = as.integer(str_sub(remaining_lease, 9, 11)))
add_list <- sort(unique(rs_transform$address))

Locational Factor

CBD

As the proximity to CBD is one of the locational factor we interested in to improve our predicted model, let’s take the coordinates of Downtown core to be the coordinates of CBD

lat <- 1.287953
lng <- 103.851784

cbd_sf <- data.frame(lat, lng) %>%
  st_as_sf(coords = c("lng", "lat"), crs=4326) %>%
  st_transform(crs=3414)

Eldercare

Eldercare_sf <- st_read(dsn = "data/geospatial/eldercare-services-shp", 
                layer = "ELDERCARE")
Reading layer `ELDERCARE' from data source 
  `C:\Quanfang777\IS415-GAA\Takehome_Exercise\Takehome_Exercise3\data\geospatial\eldercare-services-shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 133 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21
st_crs(Eldercare_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
Eldercare_sf <- st_transform(Eldercare_sf, crs=3414)

Kindergartens

library(sf)
library(onemapsgapi)

token <- "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOjEwMDUyLCJ1c2VyX2lkIjoxMDA1MiwiZW1haWwiOiJ0YWtvdGFrb3YwMDBAZ21haWwuY29tIiwiZm9yZXZlciI6ZmFsc2UsImlzcyI6Imh0dHA6XC9cL29tMi5kZmUub25lbWFwLnNnXC9hcGlcL3YyXC91c2VyXC9zZXNzaW9uIiwiaWF0IjoxNjc5MDQ0NzE3LCJleHAiOjE2Nzk0NzY3MTcsIm5iZiI6MTY3OTA0NDcxNywianRpIjoiOTdlYzA1NDk2YTJlNGMwMWUyNDFiMGRlMDgyOWY5YjMifQ.6vnUA55we0IJa_-I5eNv2o8MCrc0hVpxmXnBzMfhSDM"
search_themes(token, "kindergartens") 
# A tibble: 1 × 5
  THEMENAME     QUERYNAME     ICON       CATEGORY  THEME_OWNER                  
  <chr>         <chr>         <chr>      <chr>     <chr>                        
1 Kindergartens kindergartens school.gif Education EARLY CHILDHOOD DEVELOPMENT …
Kindergartens_tibble <- get_theme(token, "kindergartens")
Kindergartens_sf <- st_as_sf(Kindergartens_tibble, coords=c("Lng", "Lat"), crs=4326)
Kindergartens_sf <- st_transform(Kindergartens_sf, crs=3414)

Childcare Center

search_themes(token, "childcare")
# A tibble: 1 × 5
  THEMENAME           QUERYNAME ICON                    CATEGORY THEME_OWNER    
  <chr>               <chr>     <chr>                   <chr>    <chr>          
1 Child Care Services childcare onemap-fc-childcare.png Family   EARLY CHILDHOO…
library(sf)
library(onemapsgapi)

themetibble <- get_theme(token, "childcare")
childcaresf <- st_as_sf(themetibble, coords=c("Lng", "Lat"), crs=4326)
childcaresf <- st_transform(childcaresf, crs=3414)

Park

search_themes(token, "parks")
# A tibble: 25 × 5
   THEMENAME                                       QUERY…¹ ICON  CATEG…² THEME…³
   <chr>                                           <chr>   <chr> <chr>   <chr>  
 1 NParks BBQ Pits                                 nparks… null  null    NATION…
 2 Tree Conservation Area                          tree_c… null  null    NATION…
 3 Under-Construction Park Connectors              nparks… null  null    NATION…
 4 Singapore Coastal Habitats Map from High Resol… singap… null  null    NATION…
 5 Centralised Grasscutting Area                   centra… unti… Enviro… NATION…
 6 Nature Reserves Gazette 2005                    nr_gaz… gaze… Recrea… NATION…
 7 Heritage Trees                                  herita… tree… Enviro… NATION…
 8 NParks Activity Area                            nparks… null  null    NATION…
 9 Under-Construction Parks                        nparks… null  null    NATION…
10 Prohibited Drone flying areas at NParks Nature… drone_… LOGG… Recrea… NATION…
# … with 15 more rows, and abbreviated variable names ¹​QUERYNAME, ²​CATEGORY,
#   ³​THEME_OWNER
library(sf)
library(onemapsgapi)
Parks_themetibble <- get_theme(token, "nationalparks")
Parks_sf <- st_as_sf(Parks_themetibble, coords=c("Lng", "Lat"), crs=4326)
Parks_sf <- st_transform(Parks_sf, crs=3414)

Shopping mall

Get Call function

get_coords <- function(add_list){
  
  # Create a data frame to store all retrieved coordinates
  postal_coords <- data.frame()
    
  for (i in add_list){
    #print(i)

    r <- GET('https://developers.onemap.sg/commonapi/search?',
           query=list(searchVal=i,
                     returnGeom='Y',
                     getAddrDetails='Y'))
    data <- fromJSON(rawToChar(r$content))
    found <- data$found
    res <- data$results
    
    # Create a new data frame for each address
    new_row <- data.frame()
    
    # If single result, append 
    if (found == 1){
      postal <- res$POSTAL 
      lat <- res$LATITUDE
      lng <- res$LONGITUDE
      new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
    }
    
    # If multiple results, drop NIL and append top 1
    else if (found > 1){
      # Remove those with NIL as postal
      res_sub <- res[res$POSTAL != "NIL", ]
      
      # Set as NA first if no Postal
      if (nrow(res_sub) == 0) {
          new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
      }
      
      else{
        top1 <- head(res_sub, n = 1)
        postal <- top1$POSTAL 
        lat <- top1$LATITUDE
        lng <- top1$LONGITUDE
        new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
      }
    }

    else {
      new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
    }
    
    # Add the row
    postal_coords <- rbind(postal_coords, new_row)
  }
  return(postal_coords)
}
url <- "https://en.wikipedia.org/wiki/List_of_shopping_malls_in_Singapore"
malls_list <- list()

for (i in 2:7){
  malls <- read_html(url) %>%
    html_nodes(xpath = paste('//*[@id="mw-content-text"]/div[1]/div[',as.character(i),']/ul/li',sep="") ) %>%
    html_text()
  malls_list <- append(malls_list, malls)
}
library(httr)
malls_list_coords <- get_coords(malls_list) %>% 
  rename("mall_name" = "address")
malls_list_coords <- subset(malls_list_coords, mall_name!= "Yew Tee Shopping Centre")
invalid_malls<- subset(malls_list_coords, is.na(malls_list_coords$postal))
invalid_malls_list <- unique(invalid_malls$mall_name)
corrected_malls <- c("Clarke Quay", "City Gate", "Raffles Holland V", "Knightsbridge", "Mustafa Centre", "GR.ID", "Shaw House",
                     "The Poiz Centre", "Velocity @ Novena Square", "Singapore Post Centre", "PLQ Mall", "KINEX", "The Grandstand")

for (i in 1:length(invalid_malls_list)) {
  malls_list_coords <- malls_list_coords %>% 
    mutate(mall_name = ifelse(as.character(mall_name) == invalid_malls_list[i], corrected_malls[i], as.character(mall_name)))
}
malls_list <- sort(unique(malls_list_coords$mall_name))
malls_coords <- get_coords(malls_list)
malls_coords[(is.na(malls_coords$postal) | is.na(malls_coords$latitude) | is.na(malls_coords$longitude)), ]
[1] address   postal    latitude  longitude
<0 rows> (or 0-length row.names)
malls_sf <- st_as_sf(malls_coords,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)

Good Primary School

pri_sch <- read_csv("data/geospatial/general-information-of-schools.csv")
Rows: 346 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (31): school_name, url_address, address, postal_code, telephone_no, tele...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pri_sch <- pri_sch %>%
  filter(mainlevel_code == "PRIMARY"| mainlevel_code == "MIXED LEVELS") %>%
  select(school_name, address, postal_code, mainlevel_code)
prisch_list <- sort(unique(pri_sch$postal_code))
prisch_coords <- get_coords(prisch_list)
prisch_coords[(is.na(prisch_coords$postal) | is.na(prisch_coords$latitude) | is.na(prisch_coords$longitude)), ]
[1] address   postal    latitude  longitude
<0 rows> (or 0-length row.names)
prisch_coords = prisch_coords[c("postal","latitude", "longitude")]
pri_sch <- left_join(pri_sch, prisch_coords, by = c('postal_code' = 'postal'))
prisch_sf <- st_as_sf(pri_sch,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)

Select top 10 school

postal_codes <- c(599986, 449761, 597610, 536451, 579767, 128806, 569405, 738907, 579646, 227988)

# Filter prisch_sf for rows with postal code in the vector
top10prisch_sf <- prisch_sf %>%
  filter(postal_code %in% postal_codes)
top10prisch_sf
Simple feature collection with 10 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 19962.23 ymin: 31957.47 xmax: 36706.79 ymax: 47144.77
Projected CRS: SVY21 / Singapore TM
# A tibble: 10 × 5
   school_name                 address posta…¹ mainl…²            geometry
 * <chr>                       <chr>   <chr>   <chr>           <POINT [m]>
 1 ADMIRALTY PRIMARY SCHOOL    11   W… 738907  PRIMARY (24296.63 47144.77)
 2 AI TONG SCHOOL              100  B… 579646  PRIMARY (27966.81 38071.92)
 3 ANGLO-CHINESE SCHOOL (JUNI… 16   W… 227988  PRIMARY (28916.32 32403.75)
 4 CATHOLIC HIGH SCHOOL        9    B… 579767  MIXED … (29286.06 37423.91)
 5 CHIJ ST. NICHOLAS GIRLS' S… 501  A… 569405  MIXED … (28104.02 39497.61)
 6 HOLY INNOCENTS' PRIMARY SC… 5    L… 536451  PRIMARY  (34765.9 38774.69)
 7 METHODIST GIRLS' SCHOOL (P… 11   B… 599986  PRIMARY (22442.59 34984.58)
 8 NAN HUA PRIMARY SCHOOL      30   J… 128806  PRIMARY (19962.23 33496.24)
 9 PEI HWA PRESBYTERIAN PRIMA… 7    P… 597610  PRIMARY (21648.98 35582.91)
10 TAO NAN SCHOOL              49   M… 449761  PRIMARY (36706.79 31957.47)
# … with abbreviated variable names ¹​postal_code, ²​mainlevel_code

Hawker Center

hawker_sf <- st_read("data/geospatial/hawker-centres-kml.kml") 
Reading layer `HAWKERCENTRE' from data source 
  `C:\Quanfang777\IS415-GAA\Takehome_Exercise\Takehome_Exercise3\data\geospatial\hawker-centres-kml.kml' 
  using driver `KML'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449217
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
hawker_sf <- st_transform(hawker_sf, crs=3414)
st_crs(hawker_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Bus Stop

bus_sf <- st_read(dsn="data/geospatial/BusStop_Feb2023", layer="BusStop")
Reading layer `BusStop' from data source 
  `C:\Quanfang777\IS415-GAA\Takehome_Exercise\Takehome_Exercise3\data\geospatial\BusStop_Feb2023' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
bus_sf <- st_transform(bus_sf, crs=3414)

Mrt

rail_network_sf <- st_read(dsn="data/geospatial/TrainStation_Feb2023", layer="RapidTransitSystemStation")
Reading layer `RapidTransitSystemStation' from data source 
  `C:\Quanfang777\IS415-GAA\Takehome_Exercise\Takehome_Exercise3\data\geospatial\TrainStation_Feb2023' 
  using driver `ESRI Shapefile'
Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
GDAL Message 1: Non closed ring detected. To avoid accepting it, set the
OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO
Simple feature collection with 220 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 6068.209 ymin: 27478.44 xmax: 45377.5 ymax: 47913.58
Projected CRS: SVY21
rail_network_sf<- st_transform(rail_network_sf, crs=3414)

Check invalid geometry for locational factor

length(which(st_is_valid(bus_sf) == FALSE))
[1] 0
length(which(st_is_valid(rail_network_sf) == FALSE))
[1] 2
length(which(st_is_valid(childcaresf) == FALSE))
[1] 0
length(which(st_is_valid(Eldercare_sf) == FALSE))
[1] 0
length(which(st_is_valid(cbd_sf)))
[1] 1
cbd_sf <- st_make_valid(cbd_sf)
length(which(st_is_valid(cbd_sf) == FALSE))
[1] 0
length(which(st_is_valid(childcaresf) == FALSE))
[1] 0
length(which(st_is_valid(Kindergartens_sf) == FALSE))
[1] 0
length(which(st_is_valid(malls_sf) == FALSE))
[1] 0
length(which(st_is_valid(Parks_sf) == FALSE))
[1] 0
length(which(st_is_valid(prisch_sf) == FALSE))
[1] 0
length(which(st_is_valid(top10prisch_sf) == FALSE))
[1] 0
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(hawker_sf) +
  tm_dots(alpha=0.5, #affects transparency of points
          col="#d62828",
          size=0.05) +
tm_shape(Parks_sf) +
  tm_dots(alpha=0.5,
          col="#f77f00",
          size=0.05) +
tm_shape(malls_sf) +
  tm_dots(alpha=0.5,
          col="#eae2b7",
          size=0.05)
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(childcaresf) +
  tm_dots(alpha=0.5, #affects transparency of points
          col="#2ec4b6",
          size=0.05) +
tm_shape(Eldercare_sf) +
  tm_dots(alpha=0.5,
          col="#e71d36",
          size=0.05) +
tm_shape(Kindergartens_sf) +
  tm_dots(alpha=0.5,
          col="#ff9f1c",
          size=0.05) +
tm_shape(top10prisch_sf) +
  tm_dots(alpha=0.5,
        col="#f4acb7",
        size=0.05)

Structural Factors

Exploratory Data Analysis (EDA)

ggplot(data= resale.sf, aes(x=`resale_price`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

The figure above reveals a right skewed distribution. This means that more condominium units were transacted at relative lower price, and we need to normalize it by using log transformation

resale.sf <- resale.sf %>%
  mutate(`log_resale_price` = log(resale_price))
ggplot(data=resale.sf, aes(x=`log_resale_price`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

Drawing Statistical Point Map

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(mpsz_svy21)+
  tm_polygons() +
tm_shape(resale.sf) +  
  tm_dots(col = "resale_price",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(11,14))

Hedonic Pricing Modelling

Simple Linear Regression Method